We present a new parallel algorithm for computing a least-squares solution to a sparse overdetermined system of linear equations $Ax = b$ such that the $m \times n$ matrix $A$ is sparse and the graph, $G = (V,E)$, of the matrix $H = \begin{bmatrix} I & A \\ A^T & O \end{bmatrix}$ has an $s(m+n)$-separator family, that is, by deleting a separator subset $S$ of vertices of size $\le s(m+n)$, $G$ can be partitioned into two disconnected subgraphs having vertex sets $V_1, V_2$ of sizes $\le \frac{2}{3}(m+n)$, and each of the two resulting subgraphs induced by the vertex sets $S \cup V_i$, $i = 1,2$, can be recursively $s(|S \cup V_i|)$-separated in a similar way. Our algorithm uses $O(\log(m+n) \log^2 s(m+n))$ steps and $\le s^3(m+n)$ processors; it relies on our recent parallel algorithm for solving sparse linear systems, and has several immediate applications of substantial interest, in particular to mathematical programming, to sparse nonsymmetric systems of linear equations, and to the path algebra computation. We most closely examine the impact on the linear programming problem, which requires maximizing $c^T y$ subject to $A^T y \le b$, $y \ge 0$, where $A$ is an $m \times n$ matrix. Hereafter it is assumed that $m \ge n$. The recent algorithm by N. Karmarkar gives the best known upper estimate ($O(m^{3.5} L)$ arithmetic operations, where $L$ is the input size) for the cost of the solution of this problem in the worst case. We prove an asymptotic improvement of that result in the case where the graph of the associated matrix $H$ has an $s(m+n)$-separator family; then our algorithm can be implemented using $O(mL \log m \log^2 s(m+n))$ parallel arithmetic steps, $s^3(m+n)$ processors and a total of $O(mL\, s^3(m+n) \log m \log^2 s(m+n))$ arithmetic operations. In many cases of practical importance this is a considerable improvement of the known estimates: for example, $s(m+n) = \sqrt{8(m+n)}$ if $G$ is planar (as occurs in many operations research applications, for instance, in the problem of computing the maximum multicommodity flow with a bounded number of commodities in a network having an $s(m+n)$-separator family), so that the processor bound is only $8\sqrt{8}\,(m+n)^{1.5}$ and the total number of arithmetic steps is $O((m+n)^{2.5} L)$ in that case. Similarly, Karmarkar's algorithm and the known algorithms for the solution of overdetermined linear systems are accelerated in the case of dense input matrices via our recent parallel algorithms for the inversion of dense $n \times n$ matrices using $O(\log^2 n)$ steps, $n^3$ processors. Combined with Karmarkar's algorithm, this implies $O(L(m+n) \log^2(m+n))$ steps, $n^3$ processors. The stated results promise some important practical applications. Theoretically the above processor bounds can be reduced to $o(n^{2.5})$ and $o((m+n)^{2.5})$ in the dense case and to $o(s^{2.5}(m+n))$ in the sparse case (supporting the same number of parallel steps).
1. Introduction

Numerous practical computations require finding a least-squares solution to an overdetermined system of linear equations, $Ax = b$, that is, finding a vector $x$ of dimension $n$ that minimizes $\|Ax - b\|$, given an $m \times n$ matrix $A$ and a vector $b$ of dimension $m$, where $m \ge n$. (Here and hereafter we apply the Euclidean vector norm and the associated 2-norm of matrices, [GL].) Such a problem is called the linear least squares problem, l.l.s.p. In particular, solving a linear system $Ax = b$ in the usual sense is a simplification of the l.l.s.p. where the output is either the answer that $\min_x \|Ax - b\| > 0$ or otherwise a vector $x^*$ such that $Ax^* - b = 0$.

The objective of this paper is to reexamine the time-complexity of the l.l.s.p. and to indicate the possibility of speeding up its solution using the parallel algorithms of [PR]. As a major consequence (which may become decisive for determining the best algorithm for the linear programming problem (l.p.p.), at least over some important classes of instances of that problem, see Appendix), we will substantially speed up Karmarkar's algorithm, [K], for the l.p.p., because solving the l.l.s.p. constitutes the most costly part of every iteration of that algorithm. Our acceleration of Karmarkar's algorithm is most significant in the practically important case (arising, for instance, in the multicommodity flow problem in a planar network for a fixed number of commodities, see [GM], [L]) where the input matrix of the l.p.p. is large and sparse and is associated with graphs having small separators (see the formal definition below, in sect. 3). On the other hand, our work has several further impacts. Similarly to the case of the algorithm of [K], we may immediately improve the performance of several known algorithms, in particular of the algorithms for systems of linear inequalities, [P84,85], for mathematical programming, [S], and for sparse nonsymmetric systems of linear equations (for, as we indicated above, solving a system of linear equations constitutes a particular case of the l.l.s.p. where $\min_x \|Ax - b\| = 0$). The latter observation leads to a very wide range of applications of our results, including in particular the acceleration of the simplex algorithms for sparse l.p. problems.

Further applications may include several combinatorial computations. This is demonstrated in [PRa], where, relying on the latter improvement of the algorithms for sparse nonsymmetric systems of linear equations, we extend the parallel nested dissection algorithm of [PR] to the path algebra computations.

We organize the paper as follows. In the next section we recall two known representations of the l.l.s.p. using normal equations. In sect. 3 we reexamine the computational cost of sequential algorithms for the l.l.s.p.; in particular, we recall the sequential nested dissection algorithm of [LRT] and adjust it to the case of the l.l.s.p. In sect. 4 we estimate the cost of performing our parallel algorithm for the same problem. In sect. 5 we consider one of the major applications of our results, that is, the acceleration of Karmarkar's algorithm. In the Appendix we will briefly comment on the current estimates for the computational cost of solving the l.p.p.

2. Some Equivalent Representations of the Linear Least Squares Problem

We will use the known fact (see [GL]) that the l.l.s.p. can be reduced to computing a solution $x$ to the system of normal linear equations

\[ A^T A x = A^T b, \tag{1} \]

which can be equivalently rewritten as the following system of linear equations in $r$ and $x$,

\[ H (r, x)^T = (b, 0)^T, \tag{2} \]

where

\[ H = \begin{bmatrix} \beta I & A \\ A^T & O \end{bmatrix}, \tag{3} \]

$\beta$ is a nonzero constant, see [B], p. 182. Here and hereafter $I$, $W^T$, $v^T$, $O$ and $0$ denote the identity matrix, the transposes of a matrix $W$ and of a vector $v$, the null matrix and the null vector, respectively.
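As a small numerical check of the equivalence of (1) and (2), (3), the following sketch solves both systems for a hypothetical $3 \times 2$ example with $\beta = 1$. The helper `solve` (dense Gaussian elimination with partial pivoting), the function names and the sample data are assumptions made for illustration only; they are not part of the algorithms discussed in this paper.

```python
def solve(M, rhs):
    """Solve M z = rhs by dense Gaussian elimination with partial pivoting."""
    n = len(M)
    T = [[float(v) for v in M[i]] + [float(rhs[i])] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(T[i][k]))
        T[k], T[p] = T[p], T[k]
        for i in range(k + 1, n):
            f = T[i][k] / T[k][k]
            for j in range(k, n + 1):
                T[i][j] -= f * T[k][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (T[i][n] - sum(T[i][j] * z[j] for j in range(i + 1, n))) / T[i][i]
    return z


def normal_equations(A, b):
    """Solve A^T A x = A^T b, system (1)."""
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    return solve(AtA, Atb)


def augmented_system(A, b, beta=1.0):
    """Solve H (r, x)^T = (b, 0)^T with H = [[beta*I, A], [A^T, O]]."""
    m, n = len(A), len(A[0])
    H = [[0.0] * (m + n) for _ in range(m + n)]
    for i in range(m):
        H[i][i] = beta
        for j in range(n):
            H[i][m + j] = A[i][j]
            H[m + j][i] = A[i][j]
    z = solve(H, list(b) + [0.0] * n)
    return z[:m], z[m:]


# Hypothetical sample data.
A = [[1, 0], [0, 1], [1, 1]]
b = [1, 2, 4]
x1 = normal_equations(A, b)
r, x2 = augmented_system(A, b)
```

Both routes return the same vector $x$, and the vector $r$ of (2) satisfies $A^T r = 0$, which is exactly the normal equation (1) written for $r = (b - Ax)/\beta$.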

If we need to solve the system $Ax = b$ in the usual sense, then that system can be equivalently rewritten as $CAx = Cb$ for any nonsingular matrix $C$, so we may generalize (2), (3) as follows,

\[ H(C)\,(r, x)^T = (Cb, 0)^T, \qquad H(C) = [\,\cdots\,]. \tag{4} \]

Here $\lambda$, $\mu$ are constants, not necessarily nonzero. For given $A$ and $b$, we may vary $\beta$, $\lambda$, $\mu$ and $C$ in (3) and (4) in order to simplify the problem or to improve the stability of its solution (obtained, say, by the nested dissection algorithm considered in the next sections).

3. The Sequential Computational Complexity of the Linear Least Squares Problem

For a l.l.s.p. with a dense matrix $A$, its solution can be obtained from (1) using $O((m/n) M(n))$ arithmetic operations, where $M(n)$ is the cost of $n \times n$ matrix multiplication, $M(n) \le 2n^3 - n^2$. Theoretically $M(n) = o(n^{2.496})$, but that bound is not practical due to the huge overhead constants hidden in that "o", [P8 In].

If the matrix $A$ is sparse, the solution can be accelerated using some special methods, see [B]. In particular, applying the conjugate gradient method or the Lanczos method (see [B], [GL]), we may reduce the cost of solving both the system (1) and (consequently) a l.l.s.p. at least to $O(n N(A))$ arithmetic operations, where $N(A)$ is the number of nonzero entries of $A$, provided that the multiplication by 0 and the addition of 0 are cost-free operations.
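The $O(n N(A))$ estimate can be illustrated by a minimal sketch of the conjugate gradient method applied to the normal equations (1) without forming $A^T A$ (the variant usually called CGLS). Each iteration uses two sparse matrix-vector products, costing $O(N(A))$ operations, and in exact arithmetic at most $n$ iterations are needed. The coordinate-list storage, the function name `cgls` and the sample data are assumptions made for illustration.

```python
def cgls(entries, m, n, b, tol=1e-24):
    """Conjugate gradients on A^T A x = A^T b without forming A^T A.

    entries: list of (i, j, a_ij) for the nonzero entries of the m x n matrix A.
    """
    def mul(v):                      # A v, one pass over the nonzeros
        out = [0.0] * m
        for i, j, a in entries:
            out[i] += a * v[j]
        return out

    def mul_t(v):                    # A^T v, one pass over the nonzeros
        out = [0.0] * n
        for i, j, a in entries:
            out[j] += a * v[i]
        return out

    x = [0.0] * n
    r = [float(t) for t in b]        # residual b - A x
    s = mul_t(r)                     # residual of the normal equations
    p = list(s)
    gamma = sum(t * t for t in s)
    for _ in range(n):               # at most n steps in exact arithmetic
        if gamma <= tol:
            break
        q = mul(p)
        alpha = gamma / sum(t * t for t in q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = mul_t(r)
        gamma_new = sum(t * t for t in s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x


# Hypothetical sample data: A = [[1, 0], [0, 1], [1, 1]], b = (1, 2, 4).
entries = [(0, 0, 1.0), (1, 1, 1.0), (2, 0, 1.0), (2, 1, 1.0)]
x = cgls(entries, 3, 2, [1.0, 2.0, 4.0])
```

In floating-point arithmetic more than $n$ iterations may be needed for ill-conditioned $A$; the sketch stops after $n$ steps only to mirror the operation count stated above.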

We will single out a more specific case encountered in many practical instances of the l.l.s.p., that is, the instances where the matrix $A$ is sparse and furthermore the graph $G = (V,E)$ associated with the matrix $H$ has an $s(m+n)$-separator family where $s(m+n) = o(m+n)$. Here and hereafter we follow the recursive Definition 1.1 of sect. 1.2 of [PR] (compare also [LRT]), saying that $G$ has an $s(m+n)$-separator family if $G$ can be partitioned into two disconnected subgraphs with the vertex sets $V_1$ and $V_2$ of sizes $\le \alpha |V|$ (for a constant $\alpha < 1$) by deleting a separator subset $S$ of vertices of size $\le s(m+n)$ and if, recursively, each of the two subgraphs of $G$ induced by the vertex sets $S \cup V_i$, $i = 1,2$, has an $s(|S \cup V_i|)$-separator family.
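The top level of this definition can be checked mechanically. The sketch below tests a single separation step on a $k \times k$ grid graph, using the middle column as the separator $S$, $\alpha = 2/3$ and the planar bound $\sqrt{8|V|}$ on the separator size. The grid example and the names `grid_graph` and `is_separation` are assumptions made for illustration; the recursive part of the definition is not checked.

```python
from math import sqrt


def grid_graph(k):
    """Adjacency sets of the k x k grid graph; vertices are pairs (row, col)."""
    adj = {(i, j): set() for i in range(k) for j in range(k)}
    for (i, j) in adj:
        for (di, dj) in ((1, 0), (0, 1)):
            w = (i + di, j + dj)
            if w in adj:
                adj[(i, j)].add(w)
                adj[w].add((i, j))
    return adj


def is_separation(adj, S, V1, V2, alpha, s_bound):
    """Check one level of the separator definition for the partition (S, V1, V2)."""
    V = set(adj)
    # S, V1, V2 must partition the vertex set.
    if (S | V1 | V2) != V or (S & V1) or (S & V2) or (V1 & V2):
        return False
    # Size bounds on the separator and on the two parts.
    if len(S) > s_bound or max(len(V1), len(V2)) > alpha * len(V):
        return False
    # No edge may join V1 and V2 once S is deleted.
    return all(not (adj[v] & V2) for v in V1)


k = 5
adj = grid_graph(k)
S = {(i, k // 2) for i in range(k)}
V1 = {v for v in adj if v[1] < k // 2}
V2 = {v for v in adj if v[1] > k // 2}
ok = is_separation(adj, S, V1, V2, alpha=2 / 3, s_bound=sqrt(8 * len(adj)))
```

For the graph of the matrix $H$, the vertices would be the $m+n$ row indices of $H$ and the edges the positions of its nonzero off-diagonal entries.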

In that case the application of the techniques of nested dissection (see [G], [LRT], [B], p. 182) decreases the cost of the solution of (2), and consequently of the original l.l.s.p., to $O(|E| + M(s(m+n)))$ arithmetic operations, where $|E|$ is the cardinality of the edge set of $G$, [LRT]. This is the cost of computing the $LDL^T$-factorization of $H$. The subsequent evaluation of the vectors $r$, $x$ satisfying (2) costs only $O(|E| + (s(m+n))^2)$ arithmetic operations, so the approach is particularly effective where several systems (1) with fixed $A$ and variable $b$ must be solved.
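The pattern "factor $H$ once, then solve for several $b$" can be sketched as follows. This is a dense $LDL^T$ factorization without pivoting or any nested dissection ordering, so it does not attain the cost bounds above; it only illustrates the reuse of the factors. It assumes $\beta > 0$ and a matrix $A$ of full column rank, in which case all pivots are nonzero. The function names and the data are hypothetical.

```python
def ldl_factor(H):
    """Dense LDL^T factorization of a symmetric matrix, no pivoting."""
    n = len(H)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = H[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (H[i][j] - s) / d[j]
    return L, d


def ldl_solve(L, d, rhs):
    """Solve L D L^T z = rhs using the stored factors."""
    n = len(d)
    z = [float(t) for t in rhs]
    for i in range(n):                       # forward substitution with L
        z[i] -= sum(L[i][k] * z[k] for k in range(i))
    z = [z[i] / d[i] for i in range(n)]      # diagonal scaling with D
    for i in reversed(range(n)):             # back substitution with L^T
        z[i] -= sum(L[k][i] * z[k] for k in range(i + 1, n))
    return z


def augmented_matrix(A, beta=1.0):
    """H = [[beta*I, A], [A^T, O]] of (3)."""
    m, n = len(A), len(A[0])
    H = [[0.0] * (m + n) for _ in range(m + n)]
    for i in range(m):
        H[i][i] = beta
        for j in range(n):
            H[i][m + j] = H[m + j][i] = A[i][j]
    return H


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
m, n = 3, 2
L, d = ldl_factor(augmented_matrix(A))       # computed once
solutions = [ldl_solve(L, d, b + [0.0] * n)[m:]
             for b in ([1.0, 2.0, 4.0], [1.0, 1.0, 2.0])]
```

Each additional right-hand side costs only the two triangular substitutions and the diagonal scaling.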

4. Acceleration of the Solution of Linear Least Squares Problem Using Parallel Algorithms.

For large input matrices $A$ the sequential algorithms for the l.l.s.p. can be prohibitively slow. Their drastic acceleration that fully preserves their efficiency can be obtained using the recent parallel algorithms of [PR], where in each step every processor may perform one arithmetic operation. (Note that we need fewer processors where we agree to use more steps.) Specifically, applying the matrix inversion algorithm of [PR], we solve the system (1) in $O(\log^2 n)$ parallel steps using $M(n)/\log n$ processors, and consequently we solve the original l.l.s.p. using $O(\log m + \log^2 n)$ steps, $(M(n)/\log n)(1 + m/(n \log n))$ processors. These are the bounds in the case where $A$ is a general (dense) matrix. If $A$ is sparse and if the graph $G = (V,E)$ of $H$ has an $s(m+n)$-separator family, then the parallel nested dissection algorithm of [PR] computes a special recursive $s(m+n)$-factorization of the matrix $H$ of (2), (3) using $O(\log m \log^2 s(m+n))$ parallel steps and $|E| + M(s(m+n))/\log s(m+n)$ processors. Following Definition 4.1 of [PR], we define such a recursive $s(m+n)$-factorization of $H$ as a sequence of matrices $H_0, H_1, \ldots, H_d$ such that $H_0 = PHP^T$, $P$ is an $(m+n) \times (m+n)$ permutation matrix,

\[ H_g = \begin{bmatrix} X_g & Y_g^T \\ Y_g & Z_g \end{bmatrix}, \qquad Z_g = H_{g+1} + Y_g X_g^{-1} Y_g^T, \tag{6} \]

for $g = 0,1,\ldots,d-1$, and $X_g$ is a block-diagonal matrix consisting of square blocks of sizes at most $s(\alpha^{d-g}(m+n)) \times s(\alpha^{d-g}(m+n))$, where $\alpha^d (m+n) \le c$ for a constant $c$. The factorization (6) has length $d = O(\log m)$, so its computation is reduced to $O(\log m)$ steps of matrix multiplication and inversion versus $m+n$ such steps of the sequential nested dissection algorithms. Although such a recursive factorization (6) is distinct from the more customary factorization used in the sequential algorithms, both have similar power, that is, when the recursive factorization (6) is available, $O((\log m)(\log s(m+n)))$ parallel steps and $|E| + (s(m+n))^2$ processors suffice in order to solve the system (2) and consequently the original l.l.s.p.
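One elimination step behind (6) can be written out explicitly. The identities below are the standard block factorization implied by the definition of $Z_g$ in (6); they are given as a reading aid under that definition, not as a quotation of [PR].

```latex
\begin{aligned}
H_g &= \begin{bmatrix} X_g & Y_g^T \\ Y_g & Z_g \end{bmatrix}
     = \begin{bmatrix} I & O \\ Y_g X_g^{-1} & I \end{bmatrix}
       \begin{bmatrix} X_g & O \\ O & H_{g+1} \end{bmatrix}
       \begin{bmatrix} I & X_g^{-1} Y_g^T \\ O & I \end{bmatrix},
\qquad H_{g+1} = Z_g - Y_g X_g^{-1} Y_g^T, \\[4pt]
H_g^{-1} &= \begin{bmatrix} I & -X_g^{-1} Y_g^T \\ O & I \end{bmatrix}
            \begin{bmatrix} X_g^{-1} & O \\ O & H_{g+1}^{-1} \end{bmatrix}
            \begin{bmatrix} I & O \\ -Y_g X_g^{-1} & I \end{bmatrix}.
\end{aligned}
```

Since $X_g$ is block diagonal, $X_g^{-1}$ is obtained by inverting its diagonal blocks independently of each other, and this is the step that can be carried out in parallel at every level $g$.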


Comparing the cost bounds of [LRT] and [PR], we can see that the parallelization is fully efficient, that is, the product of the two upper bounds on the numbers of steps and processors of [PR] is equal (within a polylogarithmic factor) to the bound on the number of arithmetic operations in the current best sequential algorithm of [LRT] for the same problem. The same efficiency criterion is satisfied in the algorithms of [PR] inverting an $n \times n$ dense matrix in $O(\log^2 n)$ parallel steps using $M(n)/\log n$ processors. Consequently all our parallel algorithms for a l.l.s.p. are also fully efficient.

The complexity estimates of [PR] have been established in the case of well-conditioned input matrices; the algorithms of [PR] output the approximate solutions with a sufficiently high precision. On the other hand, all the estimates have been extended to the case of arbitrary integer input matrices in [P85a] using some different techniques. In that case the solutions are computed exactly.

5. Parallelization of Karmarkar's Algorithm and Its Application to Sparse Linear Programming.

In this section we will examine the cost of Karmarkar's linear programming algorithm, [K], and of its improvements that use the parallelization and the nested dissection. At first we will reproduce that algorithm, which solves the problem of the minimization of the linear function $c^T y$ subject to the constraints

\[ A^T y = 0, \qquad \sum_j y_j = 1, \qquad y \ge 0, \tag{7} \]

where $y = [y_j,\ j = 0,1,\ldots,m-1]$ and $c$ are $m$-dimensional vectors, $A^T$ is an $n \times m$ matrix, $m \ge n$, and $y$ is unknown. This version is equivalent to the canonical linear programming problem of the minimization of $c^T y$ subject to $A^T y \le b$, $y \ge 0$, see [K] and compare [C], [M]. We will designate $e = [1,1,\ldots,1]^T$, $y(i) = [y_0(i), y_1(i), \ldots, y_{m-1}(i)]$,

\[ D(0) = I, \qquad D(i) = \mathrm{diag}(y_0(i), y_1(i), \ldots, y_{m-1}(i)), \qquad B^T = B^T(i) = \begin{bmatrix} A^T D(i) \\ e^T \end{bmatrix}. \tag{8} \]

(All the matrices $D(i)$ encountered in the algorithm of [K] are nonsingular.) The algorithm proceeds as follows.

Initialize. Choose $\epsilon > 0$ (prescribed tolerance) and a parameter $\beta$ (in particular, $\beta$ can be set equal to 1/4). Let $y(0) = (1/m)e$, $i = 0$.

Recursive Step. While $c^T y(i) > \epsilon$ do
Compute the vector $y(i+1) = q(y(i))$, increment $i$.

Given the vector $y(i)$, the vector $y(i+1)$ is computed as follows.

1. Compute the matrix $B = B(i)$ of (8), that is, compute the matrix $A^T D(i)$ and augment it by appending the row $e^T$.

2. Compute the vector $c_p = [I - B(B^T B)^{-1} B^T] D(i) c$.

3. Compute the vector $z(i) = y(0) - \beta r c_p / \|c_p\|$, where $r = 1/\sqrt{m(m-1)}$.

4. Compute the vector $y(i+1) = D(i) z(i) / (e^T D(i) z(i))$.
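Steps 1-4 can be sketched in a few lines of dense code. The sketch below is an illustration under several assumptions: the feasibility and optimality checks of [K] are omitted, step 2 is carried out by solving the normal equations $(B^T B) w = B^T D(i) c$ with a hypothetical dense helper `solve`, $D(i) = \mathrm{diag}(y(i))$ is used for every $i$ (for $i = 0$ this differs from $D(0) = I$ only by a scalar factor, which does not change $y(1)$), and the tiny test problem is invented for the example.

```python
from math import sqrt


def solve(M, rhs):
    """Solve M z = rhs by dense Gaussian elimination with partial pivoting."""
    n = len(M)
    T = [[float(v) for v in M[i]] + [float(rhs[i])] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(T[i][k]))
        T[k], T[p] = T[p], T[k]
        for i in range(k + 1, n):
            f = T[i][k] / T[k][k]
            for j in range(k, n + 1):
                T[i][j] -= f * T[k][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (T[i][n] - sum(T[i][j] * z[j] for j in range(i + 1, n))) / T[i][i]
    return z


def karmarkar_step(A, c, y, beta=0.25):
    """One evaluation y(i+1) = q(y(i)); A is m x n, so that A^T y = 0 in (7)."""
    m, n = len(A), len(A[0])
    # Step 1: B = [D A | e], i.e. B^T is A^T D with the row e^T appended.
    B = [[y[j] * A[j][k] for k in range(n)] + [1.0] for j in range(m)]
    Dc = [y[j] * c[j] for j in range(m)]
    # Step 2: c_p = Dc - B w, where w solves (B^T B) w = B^T Dc.
    BtB = [[sum(B[j][p] * B[j][q] for j in range(m)) for q in range(n + 1)]
           for p in range(n + 1)]
    BtDc = [sum(B[j][p] * Dc[j] for j in range(m)) for p in range(n + 1)]
    w = solve(BtB, BtDc)
    cp = [Dc[j] - sum(B[j][p] * w[p] for p in range(n + 1)) for j in range(m)]
    norm = sqrt(sum(t * t for t in cp))  # assumed nonzero (not yet optimal)
    # Step 3: move from the center y(0) = e/m along -c_p.
    r = 1.0 / sqrt(m * (m - 1))
    z = [1.0 / m - beta * r * t / norm for t in cp]
    # Step 4: map back and renormalize so that the entries sum to 1.
    Dz = [y[j] * z[j] for j in range(m)]
    total = sum(Dz)
    return [t / total for t in Dz]


def karmarkar(A, c, eps=1e-6, beta=0.25, max_iter=1000):
    m = len(A)
    y = [1.0 / m] * m
    for _ in range(max_iter):
        if sum(ci * yi for ci, yi in zip(c, y)) <= eps:
            break
        y = karmarkar_step(A, c, y, beta)
    return y


# Invented test problem: minimize y_0 + y_1 subject to y_0 - y_1 = 0,
# y_0 + y_1 + y_2 = 1, y >= 0; the minimum value 0 is attained at (0, 0, 1).
A = [[1.0], [-1.0], [0.0]]
c = [1.0, 1.0, 0.0]
y1 = karmarkar_step(A, c, [1.0 / 3.0] * 3)
y_final = karmarkar(A, c)
```

Every iterate stays feasible for (7), because $c_p$ lies in the null space of $B^T$; the cost of each call is dominated by the solution of the system with the matrix $B^T B$.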

The algorithm also includes the checks for infeasibility and optimality (see [K]), but it is easy to verify that their computational costs, as well as the computational cost of the reduction of the problem from the canonical form to (7), are dominated by the cost of computing the vector $q(y(i))$ at the recursive steps, which is, in turn, dominated by the cost of computing $B^T B$ given $B^T = B^T(i)$ for all $i$. [K] shows that $B^T B$ can be represented as follows,

\[ B^T B = \begin{bmatrix} A^T D^2(i) A & 0 \\ 0^T & m \end{bmatrix}, \]

so the inversion of $B^T B$ is reduced to the inversion of $A^T D^2(i) A$, which in turn is reduced to the inversion of the matrix $H$ of (2), (3) where $A$ is replaced by $D(i)A$. Furthermore we can see that it suffices to compute the product $(B^T B)^{-1} B^T D(i) c$, and this amounts to matrix-vector multiplications and to solving a system of linear equations with the matrix $H$.
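The block form of $B^T B$ follows directly from the definition of $B^T$ in (8). The short derivation below assumes $D = D(i) = \mathrm{diag}(y(i))$ with $y(i)$ feasible for (7), so that $De = y(i)$ and $A^T y(i) = 0$.

```latex
B^T B
= \begin{bmatrix} A^T D \\ e^T \end{bmatrix}
  \begin{bmatrix} D A & e \end{bmatrix}
= \begin{bmatrix} A^T D^2 A & A^T D e \\ e^T D A & e^T e \end{bmatrix}
= \begin{bmatrix} A^T D^2 A & A^T y(i) \\ y(i)^T A & m \end{bmatrix}
= \begin{bmatrix} A^T D^2 A & 0 \\ 0^T & m \end{bmatrix}.
```

The vector $w = (B^T B)^{-1} B^T D c$ is therefore the least-squares solution of the overdetermined system $B w = D c$, and $c_p = D c - B w$ is its residual; this is how the l.l.s.p. enters every iteration.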

This algorithm of [K] requires $O(Lm)$ recursive steps in the worst case, so the total computational cost is $O(LmC)$, where $L$ is the input size of the problem and $C$ is the cost of computing $q(y)$ given $y$. The algorithm for the incremental computation of the inverse of $B^T B$ of sect. 6 of [K] implies that $C = O(m^{2.5})$ for the dense $A$. It is rather straightforward to perform these $O(m^{2.5})$ arithmetic operations in parallel using $O(\sqrt{m} \log m)$ steps and $m^2/\log m$ processors (and using $O(m)$ steps, $m^2$ processors for the initial inversion of $A^T A$). Applying the matrix inversion algorithms of [PR], we may perform every evaluation of $q(y)$ using $O(\log m + \log^2 n)$ parallel arithmetic steps and $(M(n)/\log n)(1 + m/(n \log n))$ processors, so we arrive at the following trade-off for the estimated total arithmetic cost of Karmarkar's algorithm: $O(m^{1.5} L)$ steps, $m^2$ processors, that is, $O(m^{3.5} L)$ arithmetic operations (via the straightforward parallelization), or $O(mL(\log m + \log^2 n))$ steps, $(M(n)/\log n)(1 + m/(n \log n))$ processors, that is, $O(mL\,M(n)(\log m + \log^2 n)(1 + m/(n \log n))/\log n)$ arithmetic operations (via the parallel matrix inversion algorithms of [PR]).

In both cases the sparsity of $A$ is not exploited. In particular, the algorithm for the incremental computation of the inverse suggested in sect. 6 of [K] does not preserve the sparsity of the original input matrix. This causes some difficulties for the practical computation, for the storage space increases substantially. Thus the special methods of solving sparse l.l.s.p., such as the conjugate gradient, the Lanczos and the nested dissection methods (see [B], [GL] and this paper), become competitive with (if not superior to) the latter algorithm of sect. 6 of [K]. If the matrix $A$ is such that the graph $G = (V,E)$ of the matrix $H$ of (3) has an $s(m+n)$-separator family and $s(m+n) = o(m+n)$, then the nested dissection method can be strongly recommended. Specifically, in this case we arrive at the estimates of $O(Lm(|E| + M(s(m+n))))$ arithmetic operations by combining [K] and [LRT], see our sect. 3, and of $O(Lm \log m \log^2 s(m+n))$ parallel arithmetic steps and $O(|E| + M(s(m+n))/\log s(m+n))$ processors by combining [K] and the parallel algorithm of this paper. The reader could better appreciate this improvement due to the application of nested dissection if we recall that $s(m+n) = \sqrt{8(m+n)}$ if the graph $G$ is planar (as occurs in many operations research applications, for instance, in the problem of computing the maximum flow in a network having an $s(m+n)$-separator family). Then the processor bound for computing the recursive factorization (6) is less than $s^3(m+n) = 8\sqrt{8}\,(m+n)^{1.5}$ and the total number of arithmetic operations is $O(mL(m+n)^{1.5})$ in that case. The premultiplications of $A$ by the nonsingular matrix $D(i)$ do not change the separator sets for the graph $G$, so these sets are precomputed once and for all, which is an additional advantage of using the nested dissection in this case.
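The arithmetic behind the planar processor bound is short enough to spell out. The numerical instance $m + n = 10^4$ is a hypothetical size chosen only to give a feeling for the magnitudes; the comparison with $(m+n)^3$ is an illustrative reference point, not a bound quoted from the text.

```latex
s(m+n) = \sqrt{8(m+n)}
\;\Longrightarrow\;
s^3(m+n) = \bigl(8(m+n)\bigr)^{3/2} = 8\sqrt{8}\,(m+n)^{1.5} \approx 22.63\,(m+n)^{1.5};
\qquad
m+n = 10^4:\quad s \approx 282.8,\quad s^3 \approx 2.26 \times 10^{7} \;\ll\; (m+n)^3 = 10^{12}.
```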

Appendix. Current Computational Cost of Solving the Linear Programming Problem.

We will start with the table presenting estimates for the computational cost of one iteration of the simplex and Karmarkar's algorithms for the l.p.p. having a dense $m \times n$ input matrix $A$, compare [P85b], [PR] and sect. 5. We will restrict our analysis to the cases where $n \le m = O(n)$.


Table 1.

                                              arithmetic     parallel             processors
                                              operations     steps

1-st iteration of [K]                         O(m^3)         O(log^2 m)           m^3/log m

average over n iterations of [K]              O(m^2.5)       O(sqrt(m) log m)     m^2

any iteration of revised simplex algorithms   O(m^2)         O(m)                 m

There is a certain controversy about the current upper estimates for the number of iterations in the two cited algorithms. The worst case upper bounds, $O(Lm)$ for [K] and $2^m$ for the simplex algorithms, greatly exceed the number of iterations required where the same algorithms run in practice or use random input instances (similarly for the algorithms of [P84,85], see [MP]). This uncertainty complicates the theoretical comparison of the effectiveness of the two algorithms. However, some preliminary comparison can be based on the partial information already available. In particular, let us assume the empirical upper bound $O(n \log m)$ on the number of iterations (pivot steps) of the simplex algorithms, cited by some authors who refer to the decades of practical computation, see [C], pp. 45-46, [M], p. 434. The bound implies that a total of $O(m^3 \log m)$ arithmetic operations suffice in the simplex algorithm vs. $O(m^3)$ used already in the first iteration of [K]. Moreover, there are special methods that efficiently update the triangular factorization of the basis matrices used in the simplex algorithms, which further simplifies every iteration of the simplex algorithms in the case of sparse input matrices, see [C], ch. 7,24, [M], ch. 7. On the other hand, if appropriate modifications of Karmarkar's original algorithm indeed run in a polylogarithmic number of iterations (as he reported at the TIMS/ORSA meeting, Boston, May, 1985, and at the 12-th International Symposium on Mathematical Programming, Boston, August, 1985), this would immediately imply a substantial acceleration over the simplex algorithm at least in the cases of i) parallel computation and dense input matrices (see Table 1), and ii) both parallel and sequential computations in the cases where the graph associated with the matrix $H$ of (2) has an $s(m+n)$-separator family, $s(m+n) = O((m+n)^q)$, $q < 1$ (see the estimates of our sect. 5).